A semi-supervised learning method to classify grant-support zone in web-based medical articles
Abstract
Traditional classifiers are trained on labeled data only. Labeled samples are often expensive to obtain, while unlabeled data are abundant. Semi-supervised learning, which uses both labeled and unlabeled data for training, can therefore be of great value. We introduce a semi-supervised learning method, decision-directed approximation combined with Support Vector Machines (SVM), to detect zones containing grant-support information (a type of bibliographic data) in online medical journal articles. We analyzed the performance of our model with different numbers of unlabeled samples and demonstrated that our proposed rules are effective in boosting classification accuracy. The experimental results show that the decision-directed approximation method with SVM improves classification accuracy when a small amount of labeled data is used together with unlabeled data to train the SVM.
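As a rough illustration of the decision-directed idea described in the abstract, the sketch below trains a classifier on a few labeled points, labels the unlabeled points with its own decisions, and retrains on the enlarged set. It is not the authors' implementation. The abstract does not state their acceptance rules or features, so the margin threshold, the toy two-cluster data, and every function name here are assumptions made for illustration. The classifier is a small linear SVM trained by stochastic subgradient descent on the hinge loss, written with the Python standard library only.

```python
import random

def decision_value(model, x):
    """Signed score w.x + b; the predicted class is the sign of the score."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, epochs=50, lr=0.01, lam=0.01, seed=0):
    """Linear SVM via SGD on the L2-regularized hinge loss (labels in {-1, +1})."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * decision_value((w, b), X[i])
            w = [wj * (1.0 - lr * lam) for wj in w]   # L2 shrinkage
            if margin < 1.0:                          # hinge loss is active
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def decision_directed_fit(X_lab, y_lab, X_unlab, rounds=5, threshold=1.0):
    """Retrain repeatedly, treating confident predictions as labels.

    The threshold is a hypothetical acceptance rule, not the paper's rule.
    """
    model = train_linear_svm(X_lab, y_lab)
    for _ in range(rounds):
        X_aug, y_aug = list(X_lab), list(y_lab)
        for x in X_unlab:
            score = decision_value(model, x)
            if abs(score) >= threshold:
                X_aug.append(x)
                y_aug.append(1 if score > 0 else -1)
        model = train_linear_svm(X_aug, y_aug)
    return model

def make_blobs(n_per_class, rng, center=2.0, sigma=0.7):
    """Toy 2-D data: one Gaussian cluster per class (illustrative only)."""
    X, y = [], []
    for label in (-1, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * center, sigma),
                      rng.gauss(label * center, sigma)])
            y.append(label)
    return X, y

def demo(seed=42):
    """Fit on 8 labeled and 200 unlabeled points; return held-out accuracy."""
    rng = random.Random(seed)
    X_lab, y_lab = make_blobs(4, rng)
    X_unlab, _ = make_blobs(100, rng)      # true labels are discarded
    X_test, y_test = make_blobs(50, rng)
    model = decision_directed_fit(X_lab, y_lab, X_unlab)
    correct = sum((1 if decision_value(model, x) > 0 else -1) == t
                  for x, t in zip(X_test, y_test))
    return correct / len(X_test)
```

Calling `demo()` returns the held-out accuracy on the toy data. In practice, the acceptance threshold controls the trade-off: a higher value admits fewer but more reliable self-labeled samples per round.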
Related articles
Semi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk
This study explores a semi-supervised classification approach, using random forest as the base classifier, to classify the risk of low-back disorders (LBDs) associated with industrial jobs. The semi-supervised approach uses unlabeled data together with a small number of labeled examples to build a better classifier. The results obtained by the proposed approach are compared with those o...
Detecting Concept Drift in Data Stream Using Semi-Supervised Classification
A data stream is a sequence of data generated by various information sources at high speed and in high volume. Classifying data streams faces three challenges: unlimited length, online processing, and concept drift. In related research, the challenge of unlimited stream length is commonly met by dividing the stream into fixed-size windows or by gradual forgetting. Concept drift refer...
A Semi-Supervised Approach for Web Spam Detection using Combinatorial Feature-Fusion
This paper describes a machine learning approach for detecting web spam. Each example in this classification task corresponds to 100 web pages from a host and the task is to predict whether this collection of pages represents spam or not. This task is part of the 2007 ECML/PKDD Graph Labeling Workshop’s Web Spam Challenge (track 2). Our approach begins by adding several human-engineered feature...
Improving Generalization for Polyphonic Piano Transcription
In this paper, we present methods to improve the generalization capabilities of a classification-based approach to polyphonic piano transcription. Support vector machines trained on spectral features are used to classify frame-level note instances, and the independent classifications are temporally constrained via hidden Markov model post-processing. Semi-supervised learning and multiconditioni...
A density based clustering approach to distinguish between web robot and human requests to a web server
Growing dependence on the Internet and the emergence of Web 2.0 applications are significantly increasing the need for web robots that crawl sites to support services and technologies. Despite their advantages, robots may consume bandwidth and reduce the performance of web servers. Although much research has been done, there is no accurate method for classifying huge data ...